ABSTRACT

An adaptive iterative decision multi-feedback detection algorithm
with constellation constraints is proposed for multiuser multi-antenna
systems. Enhanced detection and interference cancellation are
performed by introducing multiple constellation points as decision
candidates. A complexity reduction strategy is developed to avoid
redundant processing of reliable decisions, along with an adaptive
recursive least squares algorithm for time-varying channels. An
iterative detection and decoding scheme is also considered with the
proposed detection algorithm. Simulations show that the proposed
technique has a complexity as low as that of the conventional
decision feedback detector while obtaining a performance close to
that of the maximum likelihood detector.

Index Terms — MIMO systems, decision feedback receivers, 
RLS algorithms, multi-user detection, iterative processing. 

1. INTRODUCTION 

Multi-user detection (MUD) algorithms have been shown to be
applicable to 3G and next-generation multi-antenna communication
systems [1]. As the optimal maximum likelihood detector (MLD) has a
computational cost that is exponential in the number of users and
constellation points, cost-effective solutions such as the sphere
decoder (SD) and decision feedback (DF) receivers [2], [3] are
preferred, as they offer an acceptable trade-off between performance
and complexity in spatial multiplexing multi-input multi-output
(MIMO) systems. For time-varying channels, adaptive DF structures
[4], [9], [7] are promising, as adaptive algorithms can track the
channels and avoid excessive computations. However, the performance
of DF techniques is far from that of the MLD.

In this paper, an adaptive decision feedback based algorithm is
proposed for signal detection in multi-user MIMO (MU-MIMO) systems
with time-varying channels. The proposed DF algorithm can reduce the
performance gap between the optimal MLD and existing DF algorithms.
The proposed DF algorithm exploits multiple constellation points and
orderings to obtain several detection candidates. A reliability
checking technique called constellation constraint (CC) brings
improved performance to the proposed DF detector at a small
additional computational cost as compared to the conventional DF.
We also consider an iterative detection and decoding (IDD) scheme
in which the proposed DF detector is incorporated.

This paper is organized as follows: Section 2 gives the data and
system model of the MU-MIMO system; the proposed detection scheme is
described in Section 3, whereas the IDD scheme is detailed in
Section 4; the simulation results are shown in Section 5 and
Section 6 presents the conclusions of the paper.



2. DATA AND SYSTEM MODEL 

Let us consider a model of an uplink MU-MIMO system with $K$ users.
Each user is equipped with a single antenna. At the receiver side,
$N_R$ receive antennas are available for collecting the signals.
Throughout this paper, the complex baseband notation is used while
vectors and matrices are written in lower-case and upper-case
boldface, respectively. At each time instant $[i]$, $K$ users
simultaneously transmit $K$ symbols organized into a vector
$\mathbf{s}[i] = [s_1[i], s_2[i], \ldots, s_K[i]]^T$, where
$(\cdot)^T$ denotes the transpose operation, and whose entries are
chosen from a complex $C$-ary constellation set
$\mathcal{A} = \{a_1, a_2, \ldots, a_C\}$. The symbol vector
$\mathbf{s}[i]$ is transmitted over time-varying channels and the
received signal is processed by $N_R$ antennas. The received signal
is collected to form an $N_R \times 1$ vector with sufficient
statistics for detection

$$\mathbf{r}[i] = \sum_{k=1}^{K} \mathbf{h}_k[i] s_k[i] + \mathbf{v}[i] = \mathbf{H}[i]\mathbf{s}[i] + \mathbf{v}[i], \tag{1}$$

where the $N_R \times 1$ vector $\mathbf{v}[i]$ represents a zero
mean complex circular symmetric Gaussian noise with covariance
matrix $E[\mathbf{v}[i]\mathbf{v}^H[i]] = \sigma_v^2 \mathbf{I}$,
$\sigma_v^2$ is the noise variance and $\mathbf{I}$ is the identity
matrix, $E[\cdot]$ stands for the expected value and $(\cdot)^H$
denotes the Hermitian operator. The symbol vector $\mathbf{s}[i]$
has zero mean and a covariance matrix
$E[\mathbf{s}[i]\mathbf{s}^H[i]] = \sigma_s^2 \mathbf{I}$, where
$\sigma_s^2$ is the signal power. Furthermore, the elements of
$\mathbf{H}[i]$ are the time-varying complex channel gains from the
$n_T$-th transmit antenna to the $n_R$-th receive antenna, which
follow the Jakes' model [13]. The $N_R \times 1$ vector
$\mathbf{h}_k[i]$ includes the channel coefficients of user $k$ such
that $\mathbf{H}[i]$ is formed by the channel vectors of all users.
As the optimal SINR-based nulling and cancellation order (NCO) [4]
requires a high computational complexity, we determine the NCO by
computing the norms of the column vectors corresponding to all users
and we then detect them in decreasing order of their norms.
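The norm-based ordering rule can be illustrated with a short sketch. It assumes the channel matrix is available as a plain list of rows of complex gains; the function name `nco_by_column_norm` and the example matrix are hypothetical and are used only for illustration.

```python
import math

def nco_by_column_norm(H):
    """Return user indices sorted by decreasing channel-column norm.

    H is a list of rows (N_R x K) of complex gains; column k holds h_k.
    """
    n_rows, n_users = len(H), len(H[0])
    norms = [math.sqrt(sum(abs(H[n][k]) ** 2 for n in range(n_rows)))
             for k in range(n_users)]
    # The user with the largest column norm is detected first.
    return sorted(range(n_users), key=lambda k: norms[k], reverse=True)

# Hypothetical 2 x 3 channel: column 2 is the strongest, column 1 the weakest.
H_example = [[1 + 1j, 0.1, 2],
             [0.5j, 0.2j, -1j]]
order = nco_by_column_norm(H_example)
print(order)
```

Sorting by column norm needs only $K$ norm computations per channel realization, which is why it is attractive compared with an SINR-based search.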

3. PROPOSED ADAPTIVE MULTI-USER DF DETECTOR 

In the proposed adaptive multi-user DF detector, called AMUDFCC,
the received signal $\mathbf{r}[i]$ is filtered by an $N_R \times 1$
forward filter $\mathbf{w}_{f,k}[i]$, which acts as the nulling
vector of the V-BLAST algorithm. Then, for each user stream
$k = 1, \ldots, K$, the decisions are accumulated and cancelled by
the $(k-1)$-dimensional decision backward filter
$\mathbf{w}_{b,k}[i]$. Let
$\hat{\mathbf{s}}[i] = [\hat{s}_1[i], \hat{s}_2[i], \ldots, \hat{s}_K[i]]^T$
represent the detected symbol vector and let $u_k[i]$ denote the
difference between the forward filter output and the backward filter
output, given by

$$u_k[i] = \mathbf{w}_{f,k}^H[i]\mathbf{r}[i] - \mathbf{w}_{b,k}^H[i]\hat{\mathbf{s}}_{k-1}[i], \tag{2}$$

where $\mathbf{w}_{b,1}[i] = \mathbf{0}$ for the first user and the
$(k-1)$-dimensional detected symbol vector is defined as

$$\hat{\mathbf{s}}_{k-1}[i] = [\hat{s}_1[i], \hat{s}_2[i], \ldots, \hat{s}_{k-1}[i]]^T. \tag{3}$$

For notational convenience, the feedforward and feedback filters can
be concatenated together as [4]

$$\mathbf{w}_k[i] = \begin{cases} \mathbf{w}_{f,k}[i], & k = 1, \\ [\mathbf{w}_{f,k}^T[i], \mathbf{w}_{b,k}^T[i]]^T, & k = 2, \ldots, K. \end{cases} \tag{4}$$

The input can also be concatenated as

$$\mathbf{r}_k[i] = \begin{cases} \mathbf{r}[i], & k = 1, \\ [\mathbf{r}^T[i], -\hat{\mathbf{s}}_{k-1}^T[i]]^T, & k = 2, \ldots, K. \end{cases} \tag{5}$$

Then, we can rewrite (2) as

$$u_k[i] = \mathbf{w}_k^H[i]\mathbf{r}_k[i]. \tag{6}$$

Fig. 1: Block diagram of the proposed AMUDFCC detector.
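The equivalence between the forward-minus-backward form and the concatenated form of the soft output can be checked numerically. The following is a minimal sketch under the assumption that filters and signals are plain lists of complex numbers; the helper names and the numerical values are hypothetical.

```python
import math

def hdot(w, x):
    """Hermitian inner product w^H x for equal-length complex lists."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))

def df_soft_output(w_f, w_b, r, s_prev):
    """Forward-filter output minus backward-filter output."""
    return hdot(w_f, r) - hdot(w_b, s_prev)

def df_soft_output_concat(w_f, w_b, r, s_prev):
    """Same value via the concatenated filter and the concatenated input."""
    w_k = list(w_f) + list(w_b)
    r_k = list(r) + [-s for s in s_prev]   # past decisions enter with a minus sign
    return hdot(w_k, r_k)

# Hypothetical second stream (k = 2) with N_R = 2 and one past decision.
w_f = [0.5 + 0.1j, -0.2j]
w_b = [0.3 + 0j]
r = [1 - 1j, 0.4 + 0.2j]
s_prev = [(1 + 1j) / math.sqrt(2)]
u_a = df_soft_output(w_f, w_b, r, s_prev)
u_b = df_soft_output_concat(w_f, w_b, r, s_prev)
print(abs(u_a - u_b))
```

The concatenated form is convenient because a single adaptive filter update then covers both the feedforward and the feedback taps.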

As a result, the structure and the signal processing model of the
proposed DF detector are depicted in Fig. 1. We denote the receive
filter of each user as $\mathbf{w}_k[i]$ $(k = 1, \ldots, K)$, and
the value of each entry can be obtained by solving the standard
least squares (LS) problem. The LS cost function with an exponential
window is given by

$$J_k[i] = \sum_{\tau=1}^{i} \lambda^{i-\tau} \left| \hat{s}_k[\tau] - \mathbf{w}_k^H[i]\mathbf{r}_k[\tau] \right|^2, \tag{7}$$

where $0 \ll \lambda < 1$ is the forgetting factor and the scalar
$\hat{s}_k[\tau]$ denotes the detected signal at time index $\tau$
or the known pilots, in which case $\hat{s}_k[\tau] = s_k[\tau]$.
The optimal tap weight minimizing $J_k[i]$ is given by

$$\mathbf{w}_k[i] = \boldsymbol{\Phi}_k^{-1}[i]\mathbf{p}_k[i], \tag{8}$$

where the time-averaged correlation matrix is obtained by
$\boldsymbol{\Phi}_k[i] = \sum_{\tau=1}^{i} \lambda^{i-\tau}\mathbf{r}_k[\tau]\mathbf{r}_k^H[\tau]$
with $\boldsymbol{\Phi}_k[0] = \mathbf{0}$, and the time-averaged
cross-correlation vector is defined by
$\mathbf{p}_k[i] = \sum_{\tau=1}^{i} \lambda^{i-\tau}\mathbf{r}_k[\tau]\hat{s}_k^*[\tau]$.
Using the recursive least squares (RLS) algorithm [11], the optimal
weights in (8) can be calculated recursively as follows:

$$\mathbf{q}_k[i] = \boldsymbol{\Phi}_k^{-1}[i-1]\mathbf{r}_k[i], \tag{9}$$

$$\mathbf{k}_k[i] = \frac{\lambda^{-1}\mathbf{q}_k[i]}{1 + \lambda^{-1}\mathbf{r}_k^H[i]\mathbf{q}_k[i]}, \tag{10}$$

$$\boldsymbol{\Phi}_k^{-1}[i] = \lambda^{-1}\boldsymbol{\Phi}_k^{-1}[i-1] - \lambda^{-1}\mathbf{k}_k[i]\mathbf{q}_k^H[i], \tag{11}$$

$$\mathbf{w}_k[i] = \mathbf{w}_k[i-1] + \mathbf{k}_k[i]\xi_k^*[i], \tag{12}$$

where

$$\xi_k[i] = \begin{cases} s_k[i] - \mathbf{w}_k^H[i-1]\mathbf{r}_k[i], & \text{Training Mode}, \\ \hat{s}_k[i] - \mathbf{w}_k^H[i-1]\mathbf{r}_k[i], & \text{Decision-directed Mode}. \end{cases} \tag{13}$$

As indicated in (13), this adaptive detection algorithm works in two
modes. The first one is employed with the training sequence, while
the second one is the decision-directed mode, which is switched on
after the filter weights converge. In the decision-directed mode the
quality of the detected symbols has a major impact on the
performance of adaptive DF algorithms. This is because the detection
error of the current user may propagate throughout the detection of
the following users. Moreover, in time-varying channels a poor
$\xi_k[i]$ can easily damage $\mathbf{w}_k[i]$ in (12), resulting in
burst errors.

3.1. Constellation Constraints

Fig. 2: The constellation constraints (CC) device. The CC procedure
is invoked when the soft estimates $u_k[i]$ drop into the shaded area.

When the filter output $u_k[i]$ is considered unreliable, the CC
scheme produces a number of selected constellation points as the
candidate decisions. A selection algorithm is introduced to prevent
the search space from growing exponentially, saving computational
complexity by avoiding redundant processing of reliable decisions.
In the decision-directed mode, the concatenated filter output
$u_k[i]$ is checked by the CC device illustrated in Fig. 2, where a
threshold $d_{th}$ is defined, which can be either a constant or a
linear function of $\sigma_v$. The CC device finds the nearest
constellation point to $u_k[i]$ according to

$$a_k[i] = \arg\min_{a_c \in \mathcal{A}} \left\{ |u_k[i] - a_c| \right\}, \tag{14}$$



where $a_c$ represents all potential constellation points. A decision
is considered unreliable if at least one of the following conditions
holds:



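$$d > d_{th} \quad \text{when} \quad |\mathrm{Re}\{u_k[i]\}| > \frac{\sigma_s}{\sqrt{2}} \ \text{or} \ |\mathrm{Im}\{u_k[i]\}| > \frac{\sigma_s}{\sqrt{2}}, \tag{15}$$

$$|\mathrm{Re}\{u_k[i]\}| < \frac{\sigma_s}{\sqrt{2}} - d_{th} \ \text{or} \ |\mathrm{Im}\{u_k[i]\}| < \frac{\sigma_s}{\sqrt{2}} - d_{th} \quad \text{when} \quad |\mathrm{Re}\{u_k[i]\}| < \frac{\sigma_s}{\sqrt{2}} \ \text{and} \ |\mathrm{Im}\{u_k[i]\}| < \frac{\sigma_s}{\sqrt{2}}, \tag{16}$$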



where $d$ denotes the distance between the estimated symbol $u_k[i]$
and its nearest constellation point $a_k[i]$.¹ Working with the
scalar constellation, instead of searching for the closest vector,
helps to reduce the cost. Since the CC device distinguishes whether
the feedback signal is reliable, the detector maintains its
complexity at the same level as that of the conventional DF
structure. Once the filter output $u_k[i]$ drops into the unshaded
area of the constellation map, the decision is considered reliable
and the quantization operation $Q(\cdot)$ is then performed:

$$\hat{s}_k[i] = Q(u_k[i]). \tag{17}$$



If $u_k[i]$ drops into the shaded area, the decision is deemed
unreliable. The CC processing is invoked and a candidate vector is
generated as $\mathcal{C} = \{c_1, c_2, \ldots, c_m, \ldots, c_M\} \subseteq \mathcal{A}$.
The candidates are constrained by the constellation map and the
selected vector consists of the $M$ constellation points nearest to
$u_k[i]$. The size of $\mathcal{C}$ can be either fixed or variable,
which introduces a trade-off between performance and complexity.
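The nearest-point search and the candidate list can be sketched in a few lines. This is an illustration only: it assumes unit-power QPSK and, as a deliberate simplification, declares a decision reliable whenever its distance to the nearest point does not exceed the threshold, which is not the region-based test of the CC device. All function names are hypothetical.

```python
import math

# Unit-power QPSK points (assumed mapping, for illustration only).
A = [complex(x, y) / math.sqrt(2) for x in (1, -1) for y in (1, -1)]

def nearest_point(u, constellation=A):
    """Constellation point closest to the soft estimate u."""
    return min(constellation, key=lambda a: abs(u - a))

def is_reliable(u, d_th, constellation=A):
    """Simplified test (assumption): reliable iff distance to nearest point <= d_th."""
    return abs(u - nearest_point(u, constellation)) <= d_th

def candidates(u, M, constellation=A):
    """The M constellation points nearest to u, closest first."""
    return sorted(constellation, key=lambda a: abs(u - a))[:M]

u_far = 0.1 + 0.6j     # lies far from every point
u_near = 0.7 + 0.68j   # lies close to A[0]
print(is_reliable(u_far, 0.5), is_reliable(u_near, 0.5))
print(candidates(u_far, 2))
```

Only estimates flagged as unreliable trigger the candidate search, so the average extra cost depends on how often the soft outputs fall outside the reliable region.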

The refined estimate is obtained by $\hat{s}_k[i] = c_{opt}$, where
$c_{opt}$ is the optimal candidate selected from $\mathcal{C}$. This
refined decision will produce a more accurate $\xi_k[i]$ which
minimizes the mean square error (MSE). The benefits offered by the
CC algorithm are based on the assumption that the optimal feedback
candidate $c_{opt}$ is correctly selected. This selection algorithm
is described as follows: a set of tentative decision vectors
$\mathcal{B}_k = \{\mathbf{b}_k^1, \ldots, \mathbf{b}_k^m, \ldots, \mathbf{b}_k^M\}$
is defined and the number of tentative decision vectors $M$ equals
the number of selected constellation candidates. Each vector
$\mathbf{b}_k^m$ is defined as
$\mathbf{b}_k^m[i] = [\hat{s}_1[i], \ldots, \hat{s}_{k-1}[i], c_m, b_{k+1}[i], \ldots, b_K[i]]^T$.
The $K \times 1$ vector $\mathbf{b}_k^m$ consists of: 1) the
$(k-1)$-dimensional detected symbol vector
$\hat{\mathbf{s}}_{k-1}[i]$ which is used in (5); 2) a candidate
symbol $c_m$ taken from $\mathcal{C}$ for substituting the
unreliable $Q(u_k[i])$ of the $k$-th data stream; 3) by combining 1)
and 2) as the previous decisions, the tentative decisions of the
following streams $b_{k+1}[i], \ldots, b_K[i]$, which are
subsequently obtained by the adaptive detector. Let us define the
vector with the candidate constellation point as

$$\hat{\mathbf{s}}_{k,m}[i] = [\hat{\mathbf{s}}_{k-1}^T[i], c_m]^T \tag{18}$$

$$= [\hat{s}_1[i], \ldots, \hat{s}_{k-1}[i], c_m]^T. \tag{19}$$



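Therefore, (5) turns out to be

$$\mathbf{r}_{k+1,m}[i] = [\mathbf{r}^T[i], -\hat{\mathbf{s}}_{k,m}^T[i]]^T, \quad k = 1, \ldots, K. \tag{20}$$

The tentative decision of the $(k+1)$-th stream becomes

$$b_{k+1}[i] = Q\left(\mathbf{w}_{k+1}^H[i]\mathbf{r}_{k+1,m}[i]\right). \tag{21}$$

The CC algorithm selects the best constellation point among the $M$
candidates according to the maximum likelihood (ML) rule as

$$m_{opt} = \arg\min_{1 \le m \le M} \left\| \mathbf{r}[i] - \mathbf{H}[i]\mathbf{b}_k^m[i] \right\|^2. \tag{22}$$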
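The ML selection among the tentative decision vectors can be sketched as follows. The sketch assumes that the channel matrix (or its estimate) and the $M$ tentative vectors are already available as plain Python lists; the function names and the small numerical example are hypothetical.

```python
def residual_norm_sq(r, H, b):
    """Squared Euclidean norm of r - H b; H is a list of rows."""
    return sum(abs(r_n - sum(h * x for h, x in zip(row, b))) ** 2
               for r_n, row in zip(r, H))

def select_candidate(r, H, tentative_vectors):
    """Index (0-based) of the tentative vector with the smallest residual."""
    costs = [residual_norm_sq(r, H, b) for b in tentative_vectors]
    return min(range(len(costs)), key=costs.__getitem__)

# Hypothetical 2 x 2 example: r is H @ [1, 1] plus a small perturbation.
H = [[1, 0.5], [0.2j, 1]]
r = [1.55, 1 + 0.17j]
tentative = [[-1, 1], [1, 1], [1, -1]]
m_opt = select_candidate(r, H, tentative)
print(m_opt)
```

Because only $M$ vectors are compared, the cost of this step grows linearly in $M$ rather than exponentially in the number of users.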



Then $c_{opt}$ replaces the unreliable decision obtained from
$u_k[i]$. The same receive filter $\mathbf{w}_k[i]$ is used to
process all the candidates, which allows the proposed algorithm to
retain the simplicity of the adaptive DF detector. Here we employ an
RLS algorithm to estimate the channel [10].
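For reference, one exponentially weighted RLS recursion of the kind used by the adaptive detector can be sketched in pure Python. This is a sketch under stated assumptions, not the authors' implementation: the class name `RLSFilter`, the initialisation of the inverse correlation matrix to $\delta^{-1}\mathbf{I}$, and the noiseless identification experiment are all hypothetical choices made for illustration.

```python
import random

def hdot(a, b):
    """Hermitian inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

class RLSFilter:
    """Exponentially weighted RLS for one stream (illustrative sketch)."""

    def __init__(self, n_taps, lam=0.998, delta=1e-2):
        self.lam = lam
        self.w = [0j] * n_taps
        # Inverse correlation matrix, initialised to (1/delta) * I (assumption).
        self.P = [[(1 / delta if m == n else 0) + 0j for n in range(n_taps)]
                  for m in range(n_taps)]

    def update(self, r_k, s_ref):
        """One recursion; s_ref is a pilot (training) or a past decision."""
        lam, n = self.lam, len(r_k)
        q = [sum(p * x for p, x in zip(row, r_k)) for row in self.P]
        denom = 1 + hdot(r_k, q) / lam
        k = [qi / lam / denom for qi in q]            # gain vector
        xi = s_ref - hdot(self.w, r_k)                # a priori error
        self.P = [[(self.P[m][c] - k[m] * q[c].conjugate()) / lam
                   for c in range(n)] for m in range(n)]
        self.w = [wi + ki * xi.conjugate() for wi, ki in zip(self.w, k)]
        return xi

# Noiseless identification of a hypothetical 2-tap filter.
rng = random.Random(0)
w_true = [0.5 + 0.2j, -0.3j]
rls = RLSFilter(n_taps=2)
for _ in range(200):
    r_k = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
    rls.update(r_k, hdot(w_true, r_k))
err = max(abs(a - b) for a, b in zip(rls.w, w_true))
print(err)
```

In training mode `s_ref` would be the known pilot, whereas in decision-directed mode it would be the (possibly CC-refined) decision, which is why decision quality directly affects the weight update.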



¹ Equation (16) defines the shaded area inside the square obtained
by connecting the four $a_c$ ($a_c = (\pm\sigma_s/\sqrt{2}, \pm j\sigma_s/\sqrt{2})$).
Equation (15) denotes the shaded area outside the square. This
concept can be further extended to multi-tier constellations, e.g.
16-QAM.



3.2. Computational Complexity 

Let us define the parameter $K = N_R$, and $M$ as the number of
candidates. The numbers of complex multiplications, corresponding
to the V-BLAST and the DF-RLS, are $2K^3 + K^2 + K$ and f K 2 - |,
respectively. As for the proposed scheme, in the worst case², it
requires $M(\frac{5}{2}K^2 - \frac{3}{2}K)$ multiplications on top
of the DF algorithm. The additional complexity is obtained as
follows:

• If $u_1$ is unreliable, we replace $Q(u_1)$ with $c_m$ and the
multiplication repeats $M$ times for the different $c_m$. The number
of complex multiplications is $M \times \sum_{k=1}^{K} k$.

• If $u_2$ is unreliable, as previously, the number of complex
multiplications is $1 + M \times \sum_{k=2}^{K} k$.

• If $u_3$ is unreliable, the number of complex multiplications is
$2 + M \times \sum_{k=3}^{K} k$.

• By summing across $K$ users we have:
$\sum_{k=1}^{K} (k-1) + M \sum_{k=1}^{K} k \cdot k$.

The overall additional complexity can be obtained by summing the
above figures with the complexity required by the ML selection rule
and the reliability checking algorithm. Moreover, the probability
of unreliable estimates decreases as the number of users increases³,
so that on average only 6.1%, 4.65% and 3.59% of the estimated
symbols (averaged over the users) require CC processing for
$K = 2, 4, 8$ users, respectively. The numerical results suggest
that the extra computations can be further reduced in larger systems
where both $N_R$ and $K$ are larger.
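To give a feel for the magnitudes involved, the worst-case expression $M(\frac{5}{2}K^2 - \frac{3}{2}K)$ quoted above can be evaluated for a few system sizes. The function name is hypothetical and the choice $M = 4$ simply mirrors the number of candidates used later in the simulations.

```python
def worst_case_extra_multiplications(K, M):
    """Worst-case extra complex multiplications, M(5/2 K^2 - 3/2 K)."""
    # 5K^2 - 3K = K(5K - 3) is always even, so integer division is exact.
    return M * (5 * K * K - 3 * K) // 2

extra = {K: worst_case_extra_multiplications(K, M=4) for K in (2, 4, 8)}
print(extra)
```

These are upper bounds reached only when every decision is flagged as unreliable; the average extra cost is far smaller because only a few percent of the symbols require CC processing.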

3.3. Multiple-Branch Processing 

In this subsection, the proposed detector is applied with several
parallel branches that are equipped with different NCO patterns. Let
us define
$\hat{\mathbf{s}}'_l[i] = \mathbf{T}_l\hat{\mathbf{s}}[i] = [\hat{s}_{1,l}[i], \hat{s}_{2,l}[i], \ldots, \hat{s}_{K,l}[i]]^T$,
a permutation of the detected symbol set $\hat{\mathbf{s}}[i]$,
ordered by the transformation matrix $\mathbf{T}_l$,
$l = 1, \ldots, L$, where each row and each column of
$\mathbf{T}_l$ contain only one '1'. We also define $u_{k,l}[i]$ as
the output of the $k$-th concatenated filter for the $l$-th branch,
which exploits the permutation matrix $\mathbf{T}_l$. The detected
symbols can be obtained in the original order by using
$\hat{\mathbf{s}}_l[i] = \mathbf{T}_l^T\hat{\mathbf{s}}'_l[i]$. The
optimal ordering scheme conducts an exhaustive search over
$L = K!$ orderings. Sub-optimal schemes have been proposed in [12]
to design the codebook with a reduced $L$.
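The role of the permutation matrices can be illustrated with a small sketch: a branch reorders the streams with $\mathbf{T}_l$ and the original order is restored with its transpose. The helper names and the chosen ordering are hypothetical.

```python
def permutation_matrix(order):
    """T with T[row][order[row]] = 1, so that (T s)[row] = s[order[row]]."""
    K = len(order)
    return [[1 if col == order[row] else 0 for col in range(K)]
            for row in range(K)]

def matvec(T, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(t * v for t, v in zip(row, x)) for row in T]

def transpose(T):
    return [list(col) for col in zip(*T)]

s = [1 + 1j, -1 + 1j, 1 - 1j]
T = permutation_matrix([2, 0, 1])       # one hypothetical branch ordering
s_perm = matvec(T, s)                   # streams in the branch's order
s_back = matvec(transpose(T), s_perm)   # original order restored
print(s_perm, s_back)
```

Since a permutation matrix is orthogonal, its transpose is its inverse, so no matrix inversion is needed to map branch decisions back to the original user order.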

4. ITERATIVE DETECTION AND DECODING 

In the following, a soft-output detector is described to improve the
performance of the proposed detector when it is concatenated with a
convolutional code. Let $b_{k,j}$ be the $j$-th bit of the $k$-th
constellation symbol, with $j = 1, 2, \ldots, \log_2 C$. We denote
$L[b_{k,j}]$ as the log-likelihood ratio (LLR) value for the coded
bit $b_{k,j}$. The extrinsic information is obtained by the detector
as [14]

$$L_e[b_{k,j}] = \log \frac{\sum_{\mathbf{s} \in \mathcal{A}_{k,j}^{1}} P(\mathbf{r}|\mathbf{s}) \exp(f(\mathbf{s}))}{\sum_{\mathbf{s} \in \mathcal{A}_{k,j}^{0}} P(\mathbf{r}|\mathbf{s}) \exp(f(\mathbf{s}))}, \tag{23}$$

² The worst case and the best case mean that all $K$ decisions are
considered unreliable and that all decisions are reliable,
respectively.
³ This is due to the increased overall detection diversity.



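and $\mathcal{A}_{k,j}^{1}$ is the set of all symbol vectors that
consist of bits satisfying $b_{k,j} = 1$; $\mathcal{A}_{k,j}^{0}$ is
similarly defined but satisfying $b_{k,j} = 0$. Similar to the
list-SD [14], a list of vectors can be found by deploying the
proposed detector, and the ML vector can be found as a tentative
decision. By appropriately selecting the tentative decisions, the
AMUDFCC detector performance can approach the optimal MLD
performance. Let $\mathcal{B}$ denote the set of tentative decisions
obtained from

$$\mathcal{B} = \mathcal{B}_1 \cup \mathcal{B}_2 \cup \ldots \cup \mathcal{B}_k \cup \ldots \cup \mathcal{B}_K. \tag{24}$$

If $L > 1$, multiple-branch (MB) processing is used and we have

$$\mathcal{B} = \mathcal{B}_1 \cup \mathcal{B}_2 \cup \ldots \cup \mathcal{B}_l \cup \ldots \cup \mathcal{B}_L. \tag{25}$$

When the intersection set is empty, i.e.
$\mathcal{A}_{k,j}^{1} \cap \mathcal{B} = \emptyset$ or
$\mathcal{A}_{k,j}^{0} \cap \mathcal{B} = \emptyset$, the LLR for
that specific bit can be filled with an arbitrary number with a
large magnitude. The probability density can be obtained by
$P(\mathbf{r}|\mathbf{s}) \propto \exp\left(-\frac{1}{\sigma_v^2}\|\mathbf{r} - \mathbf{H}\mathbf{s}\|^2\right)$,
and
$f(\mathbf{s}) = \frac{1}{2}(2\mathbf{b}_{[k,j]}^T - 1)\mathbf{L}[\mathbf{b}_{[k,j]}]$,
where $\mathbf{b}_{[k,j]}$ is the vector of all bits without the
$j$-th bit from the $k$-th symbol, and similarly for the
$\mathbf{L}$-vector.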
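A list-based evaluation of the extrinsic LLR can be sketched as follows. The sketch assumes no a priori information, i.e. $f(\mathbf{s}) = 0$ as in a first detector pass, and clips the LLR to a hypothetical value `llr_clip` when one of the two bit hypotheses has no candidate in the list. The data layout (pairs of bit labels and symbol vectors) and the antipodal example mapping are assumptions made for illustration.

```python
import math

def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def list_llr(r, H, sigma_v2, candidates, k, j, llr_clip=20.0):
    """Extrinsic LLR of bit j of stream k from a candidate list, with f(s) = 0.

    candidates: list of (bits, s) pairs; bits[k][j] is 0 or 1 and s is the
    corresponding symbol vector.
    """
    metrics = {0: [], 1: []}
    for bits, s in candidates:
        dist = sum(abs(r_n - sum(h * x for h, x in zip(row, s))) ** 2
                   for r_n, row in zip(r, H))
        metrics[bits[k][j]].append(-dist / sigma_v2)  # log P(r|s) up to a constant
    if not metrics[1]:
        return -llr_clip   # no candidate with the bit equal to 1
    if not metrics[0]:
        return llr_clip    # no candidate with the bit equal to 0
    return log_sum_exp(metrics[1]) - log_sum_exp(metrics[0])

# Hypothetical single-stream example: bit 1 -> +1, bit 0 -> -1.
H = [[1]]
r = [0.5]
cands = [(((1,),), [1]), (((0,),), [-1])]
llr_full = list_llr(r, H, 1.0, cands, k=0, j=0)
llr_clipped = list_llr(r, H, 1.0, cands[:1], k=0, j=0)
print(llr_full, llr_clipped)
```

With a priori LLRs from the decoder, the term $f(\mathbf{s})$ would simply be added to each metric before the two log-sums are formed.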



5. SIMULATION RESULTS 

In this section, simulations are presented to demonstrate the
performance of the proposed AMUDFCC detection algorithm. We consider
time-varying fading channels and QPSK modulation. The transmitted
vectors $\mathbf{s}[i]$ are grouped into frames of 500 symbol
vectors, of which the first 10 symbol vectors are training data, and
the column-norm based ordering described in Section 2 is employed.




Fig. 3: Performance with $E_b/N_0 = 13$ dB, AMUDFCC with
$d_{th} = 0.5$ and LS channel estimation. (a) AMUDFCC has a superior
performance to the conventional DF scheme and is not far from the
MLD performance obtained with the SD. (b) The AMUDFCC has a similar
cost to the conventional DF. Horizontal axes: (a) number of users
$K = N_R$; (b) number of users $K$.

In Fig. 3(a), the BER performance is shown against the number of
users, assuming $N_R = K N_T$ for a block fading channel. The BER
performance of all schemes improves as the number of receive
antennas $N_R$ grows with the number of users $K$. More importantly,
the proposed detector offers a significant performance gain over the
DF-RLS detector at a small extra computational cost, as shown in
Fig. 3(b). At the expense of more complexity, the performance can be
further improved by introducing $L$ parallel branches. The
computational complexity is shown in terms of floating-point
operations (FLOPS) per symbol detection.⁴

Fig. 4: MSE of the estimated symbols in terms of RLS iterations,
with 4 users. After 10 training vectors, the decision-directed mode
is switched on.



Fig. 4 illustrates the MSE of the symbol estimation across all 4
users in terms of RLS iterations. The channel between a transmit and
receive antenna pair follows Jakes' model [15]. Here, we have
$E_b/N_0 = 14$ dB and the normalized Doppler frequency shift equals
$10^{-2.5}$, $10^{-2.75}$ and $10^{-3}$, respectively. It is clear
that the AMUDFCC-RLS considerably reduces the MSE level when
compared to DF-RLS. For a coded system with RLS channel estimation,
the BER performance against the average SNR across all users is
shown in Fig. 5. The curves show that the proposed AMUDFCC detector
has a substantial performance gain as compared to the conventional
DF scheme. By increasing the number of branches with different NCO,
the SD performance can be approached.




Fig. 5: $K = 6$ users are separately coded by the $g = (7, 5)$, rate
$R = 1/2$, memory-2 convolutional code; the block size equals 500
vectors, with $M = 4$ candidates and $d_{th} = 0.5$. The number of
turbo iterations between the detector and the decoder is 3. Curves:
DF-RLS; AMUDFCC-RLS ($M = 4$); AMUDFCC-RLS ($M = 4$, $L = 4$); SD.
Horizontal axis: $E_b/N_0$ [dB].



⁴ The FLOPS were counted by the Lightspeed toolbox [8]. The FLOPS
count as 2 for a complex addition and as 6 for a complex
multiplication.



6. CONCLUSIONS 

In this paper, we have developed an adaptive iterative decision
feedback based detector for MU-MIMO systems in time-varying
channels. The proposed scheme is able to approach the optimal MLD
performance while requiring a significantly lower computational
cost.